perm filename VV2.DOC[VV,BGB] blob sn#135772 filedate 1974-12-17 generic text, type C, neo UTF8
{FDJC} VERIFICATION VISION

{FAJC} Bruce G. Baumgart
{JC} Robert C. Bolles

	I. Introductions
		1. Introduction
		2. Simple Example
		3. Preliminary level of image analysis
	II. Elements of Verification Vision.
		4. Prediction.
		5. Comparison.
		6. Correction.
	III. Elements of Vision Representation
		7. 2-D image representations: video, regional & feature.
		8. 3-D model representations...
		9. Task representations: procedures,
	IV. Applications of Verification Vision.
		10. Robot Factory Worker Task.
		11. Robot Chauffeur Task.
		12. The Blocks World Task.
	V. Conclusions
		13. Future Work - Recognition
		14. Conclusion.
		15. References.
---------------------------------------------------------------------
	BGB:  	introduction
		overall system ... history of others
		PREDICTION
		CORRECTION
		application ... cart or other
		conclusion

	RCB:	introduction
		overall system ... diag, grand scheme
		the task description
		TRAINING
		COMPARISON
		application ... more automation
		conclusion
INTRODUCTION

	Verification vision is the process of synchronizing visual
prediction with visual perception.  Such verification may be
performed on several levels of abstraction: 2-D images, 3-D models,
as well as semantic descriptions.  In our recent work on the
development of vision for a robot factory worker, however, predicted
images can be obtained which are nearly identical to the perceived
images.  In such a case, the verification is done for the sake of
measuring small geometric differences which are expected but which
cannot be rapidly measured by other means.  That is, the identities
of the image elements are not in question, only their precise
relative positions.
	
	Verification vision also includes "hypothesis and test," where
a predicted line within a certain span of location, orientation, and
contrast is compared with a line from a perceived image, as in [FALK]
and [SHIRAI].

	Verification vision also includes narrow angle
correlation stereo. In this case the "prediction" is another image of  the
same objects,  but taken from  a slightly different  relative position. 
The goal is  to locate matching "features" (such as correlation patches)
in order to provide the stereo  package with two positions for the  same
part of the scene (see [Thomas]).   Notice, as mentioned above, that the
identities of the models (ie. the line and correlation patch) are not in
question; only their positions in the actual image. 

	More recently there has been considerable interest in visual
perception within a programmable assembly system.  Such systems provide
complex, but predictable environments consisting of objects with curved,
textured surfaces.  There have been a few special-purpose programs which
perform  verification vision  tasks within  such environments (eg.   see
[BOLLES], [ROSSOL],  ...), but  there have been  no generalized  systems
which predict  and locate curved objects.   Garvey and Agin  at SRI have
each set up systems which deal  with real objects, but are only
peripherally concerned with shapes.

        In  this paper  we present a design (BE CONCISE !!
"organizational structure" is too many letters)
for verification vision and describe a
system which has carried out the  task of visually locating a bolt  hole
in a  brake assembly and  visually servoing a bolt  into the hole.
The brake assembly's
initial location was known to within  plus or minus 10mm for
both X and Y,  and plus or minus  10 degrees rotation about  its center. 
The location of  the hole involved predicting and  locating curves.  The
servoing was done in a stop-and-go fashion.  That is, the arm was moved
and stopped, a pair of stereo pictures was taken, a relative arm
correction was computed, and the arm was moved again. 
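The stop-and-go cycle reads as a loop.  In this minimal sketch the world is simulated as a 3-D error vector and a single gain stands in for the whole take-pictures/compute-correction machinery; none of the names below come from the actual system.

```python
# A sketch of the stop-and-go servo cycle: move, stop, look, correct.
# The "world" here is just a 3-D error vector; each pass plays the role
# of taking a stereo pair and computing a relative arm correction.

def servo_to_hole(initial_error, tolerance=0.5, gain=0.8, max_steps=20):
    """Repeat move-stop-look-correct until the residual error is small."""
    error = list(initial_error)
    steps = 0
    while max(abs(e) for e in error) > tolerance and steps < max_steps:
        # arm is stopped: "take pictures" and estimate the 3-D change
        correction = [gain * e for e in error]
        # arm moves by the computed relative correction
        error = [e - c for e, c in zip(error, correction)]
        steps += 1
    return error, steps
```

With the assembly's 10mm placement tolerance as the initial error, a few such cycles bring the residual under a fraction of a millimeter.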

         The  next  section  describes  our  theory of  computer  vision
and  shows how  verification vision  fits  into this  theory.
That  section also  characterizes  some of  the other  previous  vision
research.   The following sections use the task mentioned above to guide
the description of the current implementation of our verification vision
system. 

2. Simple Example

	For example, one likely task of a robot assembly line factory
worker might be to fasten a screw into a hole. 

3. Image Analysis
	3.1 description of Stanford video hardware 
	3.2 description first level processing - edges, curves, regions, CRE, etc.


	Experimental work in computer vision must deal with
existing image acquisition hardware. At Stanford
---------------------------------------------------------------------
II.  Elements of Verification Vision: Prediction, Comparison, Correction.

4. Prediction
		Prediction by simulation
		Prediction by training

5. Comparison
		Comparison of video images by Correlation
		Comparison of mosaic images.

6. Correction
		Correcting the camera model.
		Correcting the world model.
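The comparison-by-correlation element can be shown in miniature: slide a small patch over a search window and keep the best-matching offset.  A sum-of-squared-differences score stands in here for true correlation; this is only an illustrative sketch, not the system's comparator.

```python
# Miniature "comparison by correlation": exhaustive patch match over a
# search window, scored by sum of squared differences (lower is better).

def best_match(patch, window):
    """patch, window: 2-D lists of intensities; returns (row, col) offset."""
    ph, pw = len(patch), len(patch[0])
    best, best_score = (0, 0), None
    for r in range(len(window) - ph + 1):
        for c in range(len(window[0]) - pw + 1):
            score = sum((window[r + i][c + j] - patch[i][j]) ** 2
                        for i in range(ph) for j in range(pw))
            if best_score is None or score < best_score:
                best, best_score = (r, c), score
    return best
```

The size of the search window is exactly what the tolerance machinery below is meant to shrink: the tighter the predicted position, the cheaper this comparison.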

DISCLAIMER:

The task  was designed to  point out the  various types  of knowledge
available  and to demonstrate  a system design  which is sufficiently
general to take  advantage of  such knowledge.   In particular,  this
task was not designed to be the "best" way of accomplishing the goal,
but  rather as A way  with an available  hardware configuration.  For
example, narrow angle  stereo is probably  more generally useful  for
this  type of verification,  because fewer  features change  from one
view to the next. 
GOAL: INSERT THE BOLT(S) INTO THE BRAKE SUBASSEMBLY
(actual picture of setup)

"SITUATION" (assumptions listed: general→specific)
	programmable assembly environment ... means that there are
		cameras, arms, vises, lights, etc. under computer
		control ... which in turn means that the environment
		is predictable (eg. the lighting)
	one arm ... well calibrated, absolute (within 6.0mm) and repeatable
		(within 1.5mm)
	two cameras ... well calibrated aspect ratio, AR, (within ****)
		and focal ratio, FR, (within ****) ... plus roughly
		calibrated work station → camera transform (within ****)
	lighting ... located at position(s) ... and fixed
	bolt dispenser ... in a fixed location and able to dispense bolts
		within tolerances 1mm x 1mm x .1mm .......... which means
		that the arm (using the repeatability tolerance) can
		pick up a bolt within 2.5 mm etc.
**** CAN FAKE IT ... JUST PUT THE BOLT IN THE HAND ****
	brake subassembly ... upright, positioned at (X0,Y0) (satisfying
		the constraints: -10mm ≤ (actual - X0) ≤ +10mm and
		-10mm ≤ (actual Y - Y0) ≤ +10mm ... and the rotation about
		its center is ±10 degrees) ... these are realistic
		tolerances resulting from a UNIMATE placing the subassembly
		at the desired position at the workstation
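The subassembly placement constraints above can be written out as a check.  A trivial sketch; the 10mm and 10 degree figures are the stated tolerances, everything else is illustrative.

```python
# Check a subassembly placement against the stated tolerances:
# |actual - nominal| within 10mm in X and Y, 10 degrees in rotation.

def placement_ok(actual, nominal, xy_tol=10.0, rot_tol=10.0):
    """actual, nominal: (x_mm, y_mm, rotation_degrees) triples."""
    dx = actual[0] - nominal[0]
    dy = actual[1] - nominal[1]
    dr = actual[2] - nominal[2]
    return abs(dx) <= xy_tol and abs(dy) <= xy_tol and abs(dr) <= rot_tol
```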


"TASK" 
   STEPS TO BE TAKEN FOR EACH SUBASSEMBLY (during a production run)

(Actual and Synthetic pictures  of the task)

		use prediction to			have the arm pick
		locate the hole				   up a BOLT 
		with CAMERA 1					|
		      |
		      ↓						|
		compute an estimate
		for the 3-D change				|
		from the prediction (& SUPPORT)
		      |						|
		      ↓						
		use CAMERA 1's estimate				|
		to locate the hole
		with CAMERA 2					|
		      |
		      ↓						|
		compute the 3-D
		change from predicted				|
		to actual (using stereo
		to compute 3-D location)		    /

				\			/

				    use 3-D change to
				    correct the destination
				    of the bolt
					   |
					   ↓
				    have the arm move the
				    bolt to this position
					   | ← _________________________________
					   ↓					↑
				    use best hole position to			|
				    locate the bolt with
					CAMERA 1				|
					   |
					   ↓					|
				    use arm's Z to compute
				    an estimate for the				|
(same here)			    3-D position of the bolt
					   |					|
					   ↓
				    use this estimate to			|
				    locate the bolt with 
					CAMERA 2				|
					   |
					   ↓					|
				    use stereo to compute
				    3-D location and correction			|
				    for the next arm move
					   |					|
					   ↓
				    move the arm appropriately			|
				    if force sensors indicate that
				    the bolt hit the side of the hole,		|
				    stop ... if vision indicates that
				    the screw is in the	hole			|
				    (by 3-D position, occlusion,...),
				    stop   |					|
					   |					|
					   ↓____________________________________↑
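The flow chart above can be compressed into a procedural sketch.  The callables (locate, move_arm, force_hit, bolt_seated) are hypothetical stand-ins for the camera, arm, and force-sensing machinery; only the loop structure is taken from the chart.

```python
# Procedural sketch of the task flow chart: locate the hole (both
# cameras + stereo), move the bolt, then look-correct-move until either
# the force sensors or vision say the bolt is seated.

def insert_bolt_cycle(locate, move_arm, force_hit, bolt_seated, limit=10):
    """One production cycle for one subassembly."""
    hole = locate("hole")                       # cameras 1 and 2, stereo
    move_arm(hole)                              # corrected destination
    for _ in range(limit):
        bolt = locate("bolt")                   # stereo fix on bolt tip
        if force_hit() or bolt_seated(bolt, hole):
            return True                         # stop conditions from chart
        correction = tuple(h - b for h, b in zip(hole, bolt))
        move_arm(correction)                    # next relative arm move
    return False                                # gave up within the limit
```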


Proposed task continued

(Actual picture of brake subassembly and bolt)
(Predicted picture of brake subassembly and bolt)

MODELLING, TRAINING, CALIBRATION, AND PROGRAMMING THE STRATEGY

	GEOMED models of the brake subassembly and bolt
including curves (possibly just circles projecting
into 2-D ellipses ... want to be able to find out
the 3-D points that correspond to the points in the
2-D projection ... this would mean that one could
find out what points belong to a curve and simply
fit a curve thru them ...) ... these models also
should have photometric info (to generate synthetic
pictures and roughly estimate the contrast across
edges etc.) ... seems possible to automatically generate
the WALTZ labelling for all of the lines and curves if
they aren't too complicated ... this would be a help
for the "characterization" stage in "training"

**** NEED CURVE EXTENSION TO GEOMED ... IS THERE CURRENTLY
			     PROVISION FOR PHOTOMETRIC INFO??? HOW ABOUT THE
			     2-D REFERENCE BACK TO THE CORRESPONDING 3-D POINT???
			     TO DO THE WALTZ LABELLING NEED TO KNOW WHY A LINE
			     APPEARS IN A 2-D DRAWING (CONCAVE EDGE, CONVEX,
			     ONE SURFACE OCCLUDING ANOTHER, A SHADOW, CRACK, ETC.)
			     IS THAT TYPE OF INFO RETRIEVABLE??? ****


		TRAINING ... step thru the procedure given above ... interactively
			keeping the internal model of what is going on in 
			synchronization with the actual situation (as viewed by
			the cameras and monitored by the arm)

			TRAINING essentially consists of using the models to
			predict what will be seen, taking pictures to get what
			is actually seen, and updating (extending) the model
			so that it makes better predictions ...

			TRAINING (potentially) produces a number of things:
				actual pictures of an example assembly
				    **** FOR VIDEO COMPARE (CORRELATION) ****
				final calibration of the two cameras with respect
				    to each other and the work station
(compare syn w/ act.)		final photometric calibration (light levels etc.)
				    **** ACTUAL USE MAY BE LIMITED TO 
				    RANGE OF CONTRAST, ETC. ****
				characterizations of the features, eg. the contrast
				    across an edge, the confidence of finding the
				    best correlation for a certain patch, etc.
				    **** FOR TOPOLOGICAL COMPARE ****
(diagram showing 		estimates as to how accurate the implications
implied position of		    are which reduce the tolerances between where
curve and reduction		    a feature is expected and where it might be
of tolerances)			    (eg. how beneficial is it to have an edge
				    point on curve 6 ... what reduction in
				    tolerances can be made) ... also should point
				    out any possible confusing edges, correlations,
				    etc.
				    **** TOLERANCES ARE IMPORTANT FOR DETERMINING
				    WHICH TECHNIQUES TO USE ... FOR OBJECT POSITION,
				    CAMERA POSITION, LIGHT LEVELS, POSSIBLE
				    OCCLUSIONS, ETC. ... THE SYSTEM WILL PROBABLY
				    USE ONLY RECTANGLES (IN 2-D) TO REPRESENT
				    THE TOTAL ALLOWABLE FLUCTUATION ... TAYLOR
				    HAS SOME FANCIER THINGS WHICH MAY BE USEFUL
				    ... OR AT LEAST POSSIBLY PRETTY ENOUGH TO SHOW
				    IN A DIAGRAM OR TWO ...****

		    LOCATE THE HOLE WITH CAMERA 1
			Position the subassembly at (X0,Y0) and aim camera 1
			as desired ... to a "known" position

(pic showing		Use these positions to produce the expected view (using
overlay of pred.	hidden line elimination, curves, etc. to first produce
on actual)		a line drawing ... then a synthetic picture ... and
			finally as much of the Waltz-like information as possible)
			
			Use this expected view (mosaic +) to automatically
			locate the desired features (possibly altering the expected
			curves or the portion in the 3-D model which projects into
			that curve ???) and extract the characterization of the
(maybe diag		features ... probably will have to be interactive as
showing			opposed to completely automatic ... however, since training
adjustment)		is only done once, it seems ok if more time is required
			to do large searches to find the features ... hopefully
			the information  gained will reduce the amount of this
			searching at run-time.
			**** ADJUSTING COMES IN TWO FORMS, AT LEAST, (1) MOVING
			AROUND IN THE 2-D PICTURE TO FIND THE APPROPRIATE MATCHING
			POINT AND (2) MODIFYING THE RELATIVE TRANSFORM BETWEEN
			THE CAMERA AND THE SUBASSEMBLY ... ESSENTIALLY THE IRON-
			TRIANGLE WORK ****

(diag with		At this point the system could roughly rank the features
possible feat		according to (1) how easily they can be found (eg. large
ranked by		and with contrast) and (2) how beneficial it would be to
cost/benefit?)		find it (eg. what reduction in tolerances might be
			expected)
			**** CURVES COULD BE RANKED BY LENGTH AND CONTRAST 
			PLUS THEIR CURVATURE ... THE MORE CURVATURE, THE BETTER
			IMPLICATIONS ONE CAN MAKE ABOUT WHERE YOU ARE ON THE CURVE,
			CORRELATIONS BY SIZE AND THE DISTINCTIVENESS OF THEIR
			AUTOCORRELATIONS (OR WHATEVER)  ... REALLY ONLY USED
			TO GIVE THE PROGRAMMER HELPFUL HINTS AS TO THE GOODNESS
			OF THE VARIOUS FEATURES ... AND AS DEMO OF A STEP TO
			COME IN AUTOMATIC STRATEGY PRODUCTION ****
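The ranking idea above can be sketched with an assumed scoring formula (length times contrast, boosted by curvature).  The formula is our own illustration of the cost/benefit ordering, not a tuned measure.

```python
# Rough cost/benefit ranking of candidate curve features: longer,
# higher-contrast, more sharply curved features rank higher, since
# curvature lets one say more about where a matched point lies.

def rank_curves(curves):
    """curves: list of dicts with 'length', 'contrast', 'curvature'."""
    def benefit(c):
        return c["length"] * c["contrast"] * (1.0 + c["curvature"])
    return sorted(curves, key=benefit, reverse=True)
```

As the notes say, such a ranking is only a hint to the programmer, and a step toward automatic strategy production.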

(diagram from thesis)

			Having located the various features (a number of which
(diag from		will be correlation points) the `iron triangle' method
BGB thesis)		can be used to determine the transform between camera 1
			and the subassembly ... possibly a version of this 
			`calibration' could be set up which takes more than
			three matching points ... ie. overdetermined system
			**** WHAT IS THE STATE OF THE "IRON TRIANGLE" METHOD?
			IS THERE ANY REASON TO TRY A FIT-WHEN-OVERDETERMINED
			VERSION OF IT???   ANY IDEA HOW ACCURATE IT IS??? ****
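The fit-when-overdetermined question can be illustrated in miniature: with more than three matched points, average (least-squares) the evidence instead of solving exactly.  This sketch recovers only a 2-D translation, not the full camera-to-subassembly transform of the iron-triangle method.

```python
# Least-squares fit of a 2-D translation from an overdetermined set of
# matched points: the minimizing translation is simply the mean of the
# per-point displacements (predicted -> observed).

def fit_translation(predicted, observed):
    """predicted, observed: equal-length lists of (x, y) pairs."""
    n = len(predicted)
    dx = sum(o[0] - p[0] for p, o in zip(predicted, observed)) / n
    dy = sum(o[1] - p[1] for p, o in zip(predicted, observed)) / n
    return dx, dy
```

The extra points buy noise averaging: a single bad match shifts the answer by only 1/n of its error.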

			The same training steps can be taken for camera 2 ... 

			So far, only one position of the subassembly has been
			considered.  In order to write a program to locate the
			hole anywhere within the allowable tolerances (on X, Y,
(show 2 hidden		and the rotation about the center Z vector), the system
line draw with		should "look at" the various possibilities and make sure
occlusion)		that a sufficient number of the features will be visible
			etc.  We currently assume that none of the features
			change significantly ... ie. the shadows don't change
			to interfere with the visual location of features, features
			are not obscured by other parts of the subassembly, etc.
			If such things were possible, the model for the expected
			scene could include explicit alternatives for the
			distinctly different appearances of the object.

			Eventually it would be desirable to have the system
			capable of automatically generating a strategy for locating
			the hole (or whatever is desired).  This would be done
			by simulating the various positions within the tolerances
			and deciding which features can be used to answer
(overlay		which questions about the object's location.  So far,
tolerance box		the various visual location programs have been interactively
on picture)		set up to include a fixed sequence of checks.  Depending
			upon the initial tolerances, various techniques are used
			(eg. the hole location might use a couple of curve location
			steps because the total displacement may be large ... the
			bolt location may only use correlation because the
			tolerances at that point are (hopefully) very small).

			Our system should at least be able to display the possible
			positions (in a picture) for any point of the object.
			This is crucial for deciding upon the strategy.
			**** SEEMS TO BE ESSENTIALLY PUTTING A BOX AROUND THE
			2-D PROJECTIONS OF THE EXTREME POSITIONS ALLOWED WITHIN
			THE TOLERANCES ... EXTREME POSITIONS MAY NOT BE COMPLETELY
			HONEST AND RECTANGLES ARE CERTAINLY NOT GENERAL ENOUGH
			TO TAKE ADVANTAGE OF ALL OF THE INFORMATION, BUT I THINK
			THE IDEA IS CLEAR AND USEFUL ****
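Putting a rectangle around the 2-D projections of the extreme positions might look like the sketch below.  project() is a hypothetical stand-in for the camera projection of a model point under a given (X, Y, rotation) placement of the object.

```python
# Bound the possible 2-D positions of one object point: project it
# under each extreme placement allowed by the tolerances, then box the
# projections with an axis-aligned rectangle.

def tolerance_box(point, placements, project):
    """Returns ((min_x, min_y), (max_x, max_y)) over all placements."""
    xs, ys = zip(*(project(point, pl) for pl in placements))
    return (min(xs), min(ys)), (max(xs), max(ys))
```

As the note says, boxing only the extreme placements is not completely honest, and rectangles throw information away, but the resulting box is exactly the search window the comparison step must cover.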

(wide angle		To recap:  features will be located in the two pictures,
stereo pic)		matched, and their 3-D position computed.  These 3-D
			positions will be used to compute the transform from
			the planned position to the actual position. 
			**** DO YOU HAVE ROUTINES TO COMPUTE THE 3-D LOCATION
			GIVEN TWO POSITIONS WITHIN 2-D PICTURES ... IE. TO FIND
			THE TWO RAYS IN SPACE AND `INTERSECT' THEM OR AT LEAST
			FIND THE POINT OF CLOSEST APPROACH ... A LA SOBEL??? OR
			OTHERS??? ****
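One way to "intersect" the two camera rays, as asked above: take the midpoint of the segment of closest approach between the two 3-D lines.  This is a standard construction written from scratch; no claim is made about Sobel's or anyone else's actual routines.

```python
# Closest approach of two rays p + t*d in 3-D: solve the 2x2 normal
# equations for the parameters t1, t2 that minimize the separation,
# then return the midpoint of the connecting segment.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def closest_approach(p1, d1, p2, d2):
    """Midpoint of the shortest segment between two non-parallel rays."""
    w0 = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b            # zero only for parallel rays
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    q1 = [p + t1 * u for p, u in zip(p1, d1)]
    q2 = [p + t2 * u for p, u in zip(p2, d2)]
    return [(u + v) / 2 for u, v in zip(q1, q2)]
```

When the rays truly intersect the midpoint is the intersection; when calibration error keeps them skew, the midpoint is the natural compromise, and the residual separation is a confidence measure for the match.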


		LOCATE THE BOLT
			The same process can be used to set up the program for
			locating the bolt.  Remember that there are two distinct
			steps possible (1) locate the bolt while it is poised
			over the hole (the vision is not as time critical since
			the bolt is not moving) and (2) track and servo the
			bolt in the hole ... very time critical ... our system
			might attempt this ??? ... or do things stop and go???
			Stereo is important at this stage because there isn't
			the support hypothesis to determine the actual 3-D 
			positions from 2-D picture location (as there was
(diag showing		for locating the hole).  There is, of course, the arm's
x,y change for z)	measurement of Z, but 1→6mm off in Z makes quite a change
			in X and Y because of the angle of the cameras ... 
			**** DO YOU REALLY INTEND TO DYNAMICALLY SERVO THE
			BLOODY ARM ???  IT CERTAINLY SEEMS FEASIBLE IF THE
			LOCATION OF THE BOLT CAN BE DONE BY A FEW CORRELATIONS
			OR SOME SUCH THING ... THERE ARE REAL DYNAMIC PROBLEMS
			THOUGH ... EG. HOW TO GIVE DELTA CHANGES TO THE ARM
			ESPECIALLY SINCE ANY CORRECTION WILL HAVE TO INCLUDE
			A PREDICTION OF WHERE THE ARM PROGRESSED TO WHILE THE
			MACHINE WAS TRYING TO FIND THE BOLT IN THE PICTURE 
			... ****
ORDER OF IDEAS?

Main points (for abstract):
	A system organization for VV ... steps, models and tolerances
	Automatic prediction of `features' ... including curves
	Training ... including the location and description of curves
	Location and comparison ... simple `fixed' strategy ... use 1 to find 2nd
	Correction ... mathematics for transforms, 2-D ↔ 3-D etc. 
    Future:
	Fancier features & tolerances ... eg. auto correl pred from 3-D, Waltz...
	Fancier automatic location of features ... confidence level, 3-D compare mod
	Fancier automatic strategy development ...2-D, 3-D, tolerance simul
		... modelling relative motion
	Fancier math for arbitrary axis of rotation ... 

Introduction
	Definition ... not yes/no ... not recognition
		... predict ... compare ... correct
	covers (1) hypoth and test (2) stereo (3) pred environment ...
	example ... screw in hole (monocular) 
	    =>  (1) predictable w/i tolerances ... impt. tolerances
		(2) many types of info ...
	in the past there have been some "special purpose" vision hacks (pump)
		want to "systemetize and automate" as much as possible
		... many applications: automation, cart, ... (pump paper for back)
		... also we believe there exists a sufficiently
			interesting set of low-level operators 
	Our approach to paper ... "theory and grand scheme" plus history section 2
		... sections on what we have done ... trying to keep speculation
		to a min ... and then in the conclusion come back to the grand
		scheme and bring things together ... and explore the next
		extensions and their potential (& difficulty)
VV SYSTEM Organization (vision in general ... some related work...)
	"Vision Mandala"
	Notice this is independent of control structure ... top-down, bottom-up,
		heterarchy, ... VV is, almost by definition, top-down
		... roughly characteristics which determine top-down vs. bottom-up
	Elements of vision representation
		2-D image rep:
			"raw data": video, depth
			"raw feature pictures": edge, contour, ...
			"interpreted features": lines, corners, curves,
		3-D image rep:
			geometric, space, ... good grief
			surface photometry ...
			physics (support)
		special task rep:


	point out how others fit into this scheme ... Roberts, Falk, Waltz, Krakauer
		ROBERTS ... parameterized models ... pic, edge, lines & polygons,
			topol match to model, pick "best" transform from model to
			data, uses support to determine final 3-D position
		regions GARVEY ... & SRI PROGRESS REPORT ... 
			YAKIMOVSKY, AND LIEBERMAN
		correlation QUAM ... MARSHA JO
		blocks GUZMAN FALK WALTZ GRAPE GILL PERKINS PERKINS
		contours KRAKAUER BAUMGART
		hidden line ... WATKINS ...
		graphics ... GOURAUD, (latest Utah) 

	Fit VV in ... point out levels possible ... give some tradeoffs and reasons
	    for dealing at each level ...
	Fit in a "grand scheme" and then show "actual scheme ... in pieces"




Task accomplished ... in pieces
	purpose ... demonstrate ...
	relationship to the "grand scheme"
        Need to describe Stanford's system to put the various existing pieces of the
	        system in perspective (so to speak)
	diagram of steps



Prediction
	Goal: predict view (eventually whole movie) ... maybe just beginning for
		interactive system 
	model ... 3-D geometric + photometry (GEOMED)
	hidden line elim => mosaic with photometry, links to 3-D, and "descriptions"
	"descriptions" like Waltz ...
	example: circle with obscuring plane in front of it ... approx by lines,
		show "labelling" and info given to characterizer ... with why and
		how ...



Training
	Goal: "second calibration step" of the models (geometric, photometric,...)
		... the first step is  the initial model ... a third step might
		be the "calibration" from one picture of a sequence to the next
		(eg. following the bolt into the hole ... slightly different
		for each assembly)
	logically is another VV problem, but one-shot so less time-dependent
		ie. it uses prediction, comparison, and correction 
		the corrections are different ... updating camera vs. object pos
                Another distinction:  almost necessarily interactive  to
                insure the  correct points (features) are matched up ...
                "under a teacher's eye"
		... described here, because this is its position within a task
	Benefits of training:









Comparison
	Goal of compare: match points of model with points in picture (or features
		more generally)
	Currently sort of "fixed" strategy ... use big curves until tolerances are
		narrowed down well enough to use expensive correlations
		model used ... and dynamically changes as comparison progresses
	Manual override if confusing curves possible, etc. ... not very good ..alt
	with curves ... cost/benefit idea
		costs used ... cost of edge op, correl, #expected, etc. benefit?
	Conservative ... works like ... step thru example
	Model for curves is 2-D (in image)... for correl is too, but wrt the table
	future automatic strategies (spec)
Correction
	Goal of correction: determine an improved estimate for an object's position
		could be relative to some other object (as in our case: bolt tip
		wrt the hole) or "sort of absolute" (ie. wrt workstation coords)
	model ... stereo ... relative change for arm
Applications 
	Automatic assembly
	Cart
		one way of looking at this is that a cart with a map of the 
			road, plus possibly contours, has to do more "revelation"
			vision, but as it progresses, it can do verification
			vision ... training could be a previous trip along the same
			road ... in some sense the relative motion problems are
			different (screwdriver ... camera stays still, screwdriver
			moves ... with the cart ... the world stays still (more
			or less) and the camera moves ... )
		a smart cart (everyone ought to have one) should also do recognition
			visions ... for cars, cross streets, ...


Conclusion
	future, future, future, ... I see & I see => I am  ⊃ I am (sort of)
INTRODUCTION

       Verification vision  can be roughly  described as the  process of
matching  a predicted image or  model with an actual  image when the two
are nearly the same.  In particular, the identities of the elements (eg. 
bolt,  corner, or  correlation patch)  are not  in question;  only their
relative  positions (and possibly the  confidences associated with these
positions). 

       Verification vision has  been used in  various ways in  the past.
Possibly  the most common  is "hypothesis  and test." For  example, some
higher level procedure proposes  a certain line  (limited to a range  of
angles)  at  a certain  place  in  a  picture (within  tolerances);  the
comparison  step is  supposed to locate  the line,  if it  is there, and
return the  matching angle  and position (see  [FALK], [SHIRAI],  ...). 
Another place  verification vision has been used  is within narrow-angle
stereo programs.  In this case the "prediction" is another image of  the
same objects,  but taken from  a slightly different  relative position. 
The goal is  to locate matching "features" (such as correlation patches)
in order to provide the stereo  package with two positions for the  same
part of the scene (see [Thomas]).   Notice, as mentioned above, that the
identities of the models (ie. the line and correlation patch) are not in
question; only their positions in the actual image. 

       More  recently there  has been  considerable  interest in  visual
perception within a  programmable assembly system.  Such systems provide
complex, but predictable environments consisting of objects with curved,
textured surfaces.  There have been a few special-purpose programs which
perform  verification vision  tasks within  such environments (eg.   see
[BOLLES], [ROSSOL],  ...), but  there have been  no generalized  systems
which predict  and locate curved objects.   Garvey and Agin  at SRI have
each set up systems which deal  with real objects, but are only
peripherally concerned with shapes. 

        In  this paper  we present  an organizational structure  (theory
sounds too comprehensive) for verification vision and describe a partial
system which has carried out the  task of visually locating a bolt  hole
in a  brake assembly and  visually servoing a bolt  into the hole.   The
assembly's  initial location was known to within  plus or minus 10mm for
both X and Y,  and plus or minus  10 degrees rotation about  its center. 
The location of  the hole involved predicting and  locating curves.  The
servoing was done in a stop-and-go fashion.  That is, the arm moved  and
stopped, a pair of stereo pictures were taken, a relative arm correction
was computed, and the arm moved again. 

         The  next  section  describes  our  theory of  computer  visual
perception and  shows how  verification vision  fits  into this  theory.
That  section also  characterizes  some of  the other  previous  vision
research.   The following sections use the task mentioned above to guide
the description of the current implementation of our verification vision
system.